Feat/ecsm accelerator #657
Benchmark Results for modified programs 🚀
Review: ECSM Accelerator
This PR adds a secp256k1 scalar-multiplication precompile (
Issues found
- Medium – Off-by-7 in address-overflow check (inline comment on execution.rs:408-409)
- Medium –
- Low – Wrong expected value in assembly test comment (inline comment on test_ecsm.s:32)
No issues found in
Codex Code Review
Found two issues in the PR diff:
I could not run the Rust tests because rustup attempted to write under read-only
Benchmark Results for unmodified programs 🚀
Replaces the per-operation Fermat inversions in the double-and-add replay with audited k256 (RustCrypto) projective arithmetic plus batched inversion. The witness generator is untrusted (the ECDAS chip re-proves every step), so using audited host-side arithmetic here is sound.

- curve.rs: `replay_double_and_add` now replays the schedule in k256 `ProjectivePoint` (no per-operation inversion), converts every point to affine in one shot with `batch_normalize`, and batch-inverts the slope denominators: two batched inversions instead of ~2·len_k Fermat modpows. The slope `λ` is precomputed here (new `StepPts.lambda` field), so the witness builder never inverts.
- lib.rs: `scalar_mul_x` (executor) uses k256's optimized scalar multiplication directly, skipping the step list entirely.
- witness.rs: `build_step` consumes the precomputed `s.lambda`.
- The BigUint reference (`point_double`/`point_add`/`step_lambda`/`replay_double_and_add_reference`) is kept `#[cfg(test)]` only (production ships k256 alone), and a parity test pins k256 == reference byte-for-byte across small, structured, large, and near-order scalars.

k256 is host-side only (witness generation), never in the constraint system, and was already a transitive workspace dependency.

Replay micro-bench: ~5.9x faster than the BigUint reference on a 256-bit scalar.

Follow-up (separate stage): port the field/curve primitives we need so the num-bigint reference path can be dropped entirely.
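Batched inversion of this kind is commonly done with Montgomery's trick: multiply all elements together, invert the product once, then peel off one factor per element. The sketch below shows the technique over a toy 61-bit Mersenne prime rather than secp256k1's 256-bit field; it is not k256's implementation or API, and `P`, `mul_mod`, `inv`, and `batch_invert` are hypothetical names.

```rust
/// Toy prime modulus 2^61 - 1 (assumption for illustration only; the real
/// secp256k1 field needs multi-limb arithmetic, which k256 provides).
const P: u64 = (1u64 << 61) - 1;

fn mul_mod(a: u64, b: u64) -> u64 {
    ((a as u128 * b as u128) % P as u128) as u64
}

/// Single Fermat inversion x^(P-2) mod P via square-and-multiply.
fn inv(x: u64) -> u64 {
    let (mut base, mut exp, mut acc) = (x % P, P - 2, 1u64);
    while exp > 0 {
        if exp & 1 == 1 {
            acc = mul_mod(acc, base);
        }
        base = mul_mod(base, base);
        exp >>= 1;
    }
    acc
}

/// Montgomery's batch-inversion trick: inverts every element of `xs` with one
/// Fermat inversion plus roughly 3n multiplications. All inputs must be nonzero.
fn batch_invert(xs: &[u64]) -> Vec<u64> {
    if xs.is_empty() {
        return Vec::new();
    }
    // prefix[i] = xs[0] * xs[1] * ... * xs[i]
    let mut prefix = Vec::with_capacity(xs.len());
    let mut acc = 1u64;
    for &x in xs {
        acc = mul_mod(acc, x);
        prefix.push(acc);
    }
    // The only inversion: (xs[0] * ... * xs[n-1])^-1
    let mut inv_acc = inv(acc);
    let mut out = vec![0u64; xs.len()];
    // Walk backwards; invariant at step i: inv_acc = (xs[0] * ... * xs[i])^-1.
    for i in (1..xs.len()).rev() {
        out[i] = mul_mod(inv_acc, prefix[i - 1]); // = xs[i]^-1
        inv_acc = mul_mod(inv_acc, xs[i]); // = (xs[0] * ... * xs[i-1])^-1
    }
    out[0] = inv_acc;
    out
}
```

Inverting n slope denominators this way costs one exponentiation instead of n, which matches the "two batched inversions instead of ~2·len_k modpows" trade-off described in the commit.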
Codex Code Review
Findings
Verification
I attempted targeted tests, but the toolchain could not run in this sandbox:
bytes.resize(32, 0);
let mut out = [0u8; 32];
out.copy_from_slice(&bytes[..32]);
out
Low — silent truncation for values > 2^256
bytes.resize(32, 0) followed by bytes[..32] silently drops the high bytes if v is larger than 32 bytes. All current callers pass values validated to be < p < 2^256, so nothing is dropped today. But this is a pub function and a future caller passing an un-reduced intermediate (e.g. a product before % p) would get silently wrong output. A debug_assert!(v.bits() <= 256) at entry would catch the misuse in tests at zero release cost.
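A minimal sketch of the suggested guard. It assumes, as the `resize` / `[..32]` pattern suggests, that the helper works on little-endian bytes from `BigUint::to_bytes_le`, which has no high zero bytes, so `len() <= 32` is equivalent to `v.bits() <= 256`. The name `le_to_32` is hypothetical, and the byte slice stands in for the real `&BigUint` argument so the example stays dependency-free.

```rust
/// Hypothetical stand-in for the reviewed helper: packs a little-endian byte
/// encoding into a fixed 32-byte array.
fn le_to_32(le: &[u8]) -> [u8; 32] {
    // Suggested guard: fires on un-reduced (>= 2^256) inputs in debug/test
    // builds and compiles out in release. With a BigUint argument this would
    // be `debug_assert!(v.bits() <= 256)`.
    debug_assert!(le.len() <= 32, "value wider than 256 bits ({} bytes)", le.len());
    let mut bytes = le.to_vec();
    // Zero-pads short inputs; on longer inputs `resize` truncates, which is
    // exactly the silent high-byte loss the review warns about.
    bytes.resize(32, 0);
    let mut out = [0u8; 32];
    out.copy_from_slice(&bytes[..32]);
    out
}
```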
if !ecsm_addr_ok(addr_xg, 24)
    || !ecsm_addr_ok(addr_xr, 24)
    || !ecsm_addr_ok(addr_k, 31)
Clarity — asymmetric max offsets (24 vs 31) need a comment
The different values are correct but look like a copy-paste bug at first glance:
- `addr_xg`/`addr_xr` use 24: the MEMW bus addresses doublewords by their 8-byte-aligned base; the last doubleword's base is at `addr + 8*3 = addr + 24`, so only that offset needs to stay within the 32-bit low limb.
- `addr_k` uses 31: EC_SCALAR issues individual byte reads at `addr_k + offset` for each byte `offset ∈ 0..=31`, so the last byte's address `addr_k + 31` must fit.
A short inline comment on each call would remove the ambiguity.
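A sketch of how the check could read with the requested comments. The body of `ecsm_addr_ok` is an assumption (that `addr + max_offset` must not overflow and must fit in a 32-bit low limb, per the explanation in the review); `LOW_LIMB_MAX` and `ecsm_operands_ok` are hypothetical, and the returned boolean is the De Morgan dual of the reviewed `if !… || !…` condition.

```rust
/// Largest value the 32-bit low address limb can hold (assumed).
const LOW_LIMB_MAX: u64 = u32::MAX as u64;

/// Hypothetical body: `addr + max_offset` must not overflow and must fit the low limb.
fn ecsm_addr_ok(addr: u64, max_offset: u64) -> bool {
    match addr.checked_add(max_offset) {
        Some(end) => end <= LOW_LIMB_MAX,
        None => false,
    }
}

/// Hypothetical wrapper: true when all three operand addresses are in range.
fn ecsm_operands_ok(addr_xg: u64, addr_xr: u64, addr_k: u64) -> bool {
    // xG / xR: four doublewords accessed by their 8-byte-aligned base on the
    // MEMW bus; the last base is addr + 8*3 = addr + 24.
    ecsm_addr_ok(addr_xg, 24)
        && ecsm_addr_ok(addr_xr, 24)
        // k: EC_SCALAR reads 32 single bytes, the last one at addr_k + 31.
        && ecsm_addr_ok(addr_k, 31)
}
```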
/// Multiplicative inverse via Fermat's little theorem (`p` is prime): `self^(p-2)`.
/// Returns zero for a zero input (which never occurs for valid curve arithmetic).
pub fn inv(&self) -> Fp {
    Fp(self.0.modpow(&(p() - BigUint::from(2u32)), &p()))
}
Low — not constant-time
modpow is not constant-time, so the execution time of inv (and by extension, every λ computation in build_step) varies with the field values. Combined with the scalar-bit branching in replay_double_and_add, the entire ECSM computation leaks k via timing.
This is acceptable for a ZK prover (the prover is trusted with k; the verifier never observes prover wall-time), but a doc comment noting the non-CT behaviour would warn off any future caller that uses this for non-proving purposes (e.g., ECDSA nonce generation).
feat(ecsm): k256-backed witness generation (projective + batch inverse)
b2def07 to 29e2bb5
Clean up ECSM review nits
MauroToscano left a comment
Let's wait for Tuesday for possible changes in the spec, but this LGTM.
This PR adds the ECSM precompile: secp256k1 scalar multiplication `xR = (k·G).x`. The multiplication is implemented in the executor (`ecsm::scalar_mul_x` from the new shared `crypto/ecsm` crate, verified against known test vectors) and proven by three chips:
- … MEMW reads of `xG`/`k` and the write of `xR` (timestamp-offset so `addr_xR` may alias `addr_xG`), witnesses `yG` and proves `yG² ≡ xG³ + b` through byte-limb convolutions (quotients `q0`/`q1` + 64-entry carry arrays, range-checked via `IS_BYTE`/`IS_HALF`), checks `0 < k < N` and `xR < p` with borrow chains, and starts the loop on the new `SERVE_K`/`BIT`/`ECDAS` buses.
- … `ECDAS` bus. Each row proves one curve double or add through the λ / xR / yR convolution relations (33-byte quotients with offset `r = 3p`, 64 carries each) and consumes scalar bits from the `BIT` bus.
- … `k` through a self-referential `SERVE_K` chain (one MEMW byte read per row) and serves the set bits on the `BIT` bus.
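A toy sketch of what the chips check, at the level of plain field arithmetic: the on-curve relation `yG² = xG³ + b`, one double or add per step, presumably via the standard affine formulas (slope λ, then `xR = λ² − x1 − x2`, `yR = λ(x1 − xR) − y1`), and a left-to-right schedule that consumes one scalar bit per step. Assumptions: a small prime `P = 1009` instead of secp256k1's p, one Fermat inversion per step, and hypothetical names throughout (`toy_scalar_mul_x` is not the crate's `ecsm::scalar_mul_x`). It omits the byte-limb convolutions, quotients, and carries the chips actually use.

```rust
/// Toy field prime (assumption; secp256k1 uses a 256-bit p).
const P: i64 = 1009;
/// Curve constant b: secp256k1 is y^2 = x^3 + 7 (a = 0).
const B: i64 = 7;

/// Affine point; `None` is the point at infinity.
type Pt = Option<(i64, i64)>;

/// Canonical residue in 0..P.
fn md(a: i64) -> i64 {
    a.rem_euclid(P)
}

/// Fermat inversion a^(P-2) mod P, for nonzero a.
fn inv(a: i64) -> i64 {
    let (mut base, mut e, mut acc) = (md(a), P - 2, 1i64);
    while e > 0 {
        if e & 1 == 1 {
            acc = md(acc * base);
        }
        base = md(base * base);
        e >>= 1;
    }
    acc
}

/// The on-curve relation y^2 == x^3 + b.
fn on_curve(p: Pt) -> bool {
    match p {
        None => true,
        Some((x, y)) => md(y * y) == md(x * x * x + B),
    }
}

/// One double (p == q) or add step: slope lambda, then
/// xR = lambda^2 - x1 - x2 and yR = lambda * (x1 - xR) - y1.
fn add(p: Pt, q: Pt) -> Pt {
    let ((x1, y1), (x2, y2)) = match (p, q) {
        (None, r) | (r, None) => return r,
        (Some(a), Some(b)) => (a, b),
    };
    let lambda = if x1 == x2 {
        if md(y1 + y2) == 0 {
            return None; // q == -p (includes doubling a point with y = 0)
        }
        md(3 * x1 * x1 * inv(2 * y1)) // tangent slope, a = 0
    } else {
        md((y2 - y1) * inv(x2 - x1)) // chord slope
    };
    let x3 = md(lambda * lambda - x1 - x2);
    let y3 = md(lambda * (x1 - x3) - y1);
    Some((x3, y3))
}

/// Left-to-right double-and-add: per scalar bit (MSB first) double, then add
/// G if the bit is set. Returns the x-coordinate of k*G (None at infinity).
fn toy_scalar_mul_x(k: u64, g: Pt) -> Option<i64> {
    let mut acc: Pt = None;
    for i in (0..64).rev() {
        acc = add(acc, acc);
        if (k >> i) & 1 == 1 {
            acc = add(acc, g);
        }
    }
    acc.map(|(x, _)| x)
}

/// Brute-force a base point on the toy curve (fine for a 1009-element field).
fn find_point() -> Pt {
    for x in 0..P {
        let rhs = md(x * x * x + B);
        for y in 1..P {
            if md(y * y) == rhs {
                return Some((x, y));
            }
        }
    }
    None
}
```

In the real circuit the per-step slope is supplied as a witness and checked by convolution rather than computed with an inversion, which is why the k256 change above precomputes `λ` host-side.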